The Influence of 16PF Personality Questions on Happiness: Exploring Correlations with the World Happiness Report's Ladder Score

Table of Contents

  • Introduction
  • Methods
  • Results
  • Discussion and Conclusions

Introduction

Happiness is often regarded as a subjective and multifaceted construct. For centuries, it has captured the attention of artists, researchers and individuals who want to understand it for themselves. Understanding the factors that contribute to one's happiness has been a pursuit spanning various disciplines, including psychology, sociology, and philosophy.

In this project work, we explore the intersection of personality assessments and happiness metrics by employing the 16PF (16 Personality Factors) questionnaire and its potential influence on the World Happiness Report's Ladder Score. The 16PF questionnaire, developed by Raymond Cattell, offers a nuanced perspective on individual differences across key personality traits. On the other hand, the World Happiness Report's Ladder Score provides a broad-scale evaluation of subjective well-being across nations.

The primary objective of this project is exploratory: to investigate the correlations between the responses to specific 16PF questions and the Ladder Score from the World Happiness Report. By analyzing the relationship between these personality factors and perceived happiness, we aim to shed light on potential connections, nuances, and implications for understanding overall well-being.

Methods

Data sources and preliminary work

In [ ]:
%pip install plotly
%pip install pandas
%pip install 'SQLAlchemy==1.4.46'
%pip install nbformat
%pip install statsmodels
%pip install bs4
%pip install tabulate

Data exploration / data pipeline

In [ ]:
import pandas as pd

# Load the data into two new dataframes from the sqlite database.
personality_df_sql = pd.read_sql_table('personality', 'sqlite:///../project.sqlite')
worldhappiness_df_sql = pd.read_sql_table('worldhappiness', 'sqlite:///../project.sqlite')
In [ ]:
## Create code-question-dict from codebook.html

from bs4 import BeautifulSoup

# HTML content of the table
with open('../data/personality/16PF/codebook.html', 'r') as file:
    html_content = file.read()

# Create a BeautifulSoup object
soup = BeautifulSoup(html_content, 'html.parser')

# Find all the table rows
table_rows = soup.find_all('tr')

# Create a dictionary to store the code-question pairs
code_question_dict = {}

# Iterate over the table rows, skipping the header row.
# The slice covers 162 rows, one per question.
for row in table_rows[1:163]:
    # Each row has three cells: the code, an unused middle cell and the question
    code_element, _, question_element = row.find_all('td')

    # Extract the code and question text
    code = code_element.text.strip()
    question = question_element.text.strip()

    # Keep only the text between the first pair of double quotes
    question = question.split('"')[1]

    # Store the code-question pair in the dictionary
    code_question_dict[code] = question
    
# Save the dictionary as a csv file using pandas
pd.DataFrame.from_dict(code_question_dict, orient='index', columns=['question']).to_csv('../data/code_question.csv', header=False)
In [ ]:
# Merge the dataframes on the country_code column.
merged_df = pd.merge(personality_df_sql, worldhappiness_df_sql, on='country_code', how='inner')

# Calculate the correlation of every question column with the last column
# of the merged dataframe, which is expected to hold the ladder score.
correlations = {}
target_column = merged_df.columns[-1]

for column in merged_df.columns[1:-5]:
    correlations[column] = merged_df[target_column].corr(merged_df[column])

# Assign each correlation to a group (low, medium, high) by its absolute value
# and record its direction. Save this in a dataframe.
correlations_list = []
for k, v in correlations.items():
    if abs(v) >= 0.5:
        group = 'high'
    elif abs(v) >= 0.3:
        group = 'medium'
    else:
        group = 'low'
    # The direction labels match the ones shown in the outputs below.
    correlations_list.append({'correlation': group, 'direction': 'direct' if v > 0 else 'indirect', 'value': v, 'abs_value': abs(v), 'question_code': k})
correlations_df = pd.DataFrame(correlations_list, columns=['correlation', 'direction', 'value', 'abs_value', 'question_code'])

# Get a new column with the question text.
code_question_df = pd.read_csv('../data/code_question.csv', header=None)
code_question_df.columns = ['code', 'question']
correlations_df = pd.merge(correlations_df, code_question_df, left_on='question_code', right_on='code', how='inner')

# Sort the dataframe by the absolute value of the correlation.
correlations_df = correlations_df.sort_values(by='abs_value', ascending=False)

Results

In the following section, we look at the results of our approach. Let us start off with exploring some characteristics of the correlations.

First of all, let's look at the correlation dataframe. As a reminder: we measured the correlation between the countries' ladder scores (on a scale from 0 to 10) and the countries' answers to the respective questions of the 16PF questionnaire (on a scale from 1 to 5). The country value for a question is calculated as the mean of the answers of all participants from that country.
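The aggregation and correlation step described above can be sketched as follows. This is a minimal illustration on made-up data, not the project's actual pipeline: the participant-level table `responses`, the country codes and all values are hypothetical, and only the column names `country_code`, `F3` and `Ladder score` are taken from the notebook.

```python
import pandas as pd

# Hypothetical participant-level answers (1-5) to one question, "F3".
responses = pd.DataFrame({
    "country_code": ["AA", "AA", "BB", "BB", "CC", "CC"],
    "F3": [1, 2, 3, 4, 5, 4],
})

# Hypothetical country-level ladder scores.
ladder = pd.DataFrame({
    "country_code": ["AA", "BB", "CC"],
    "Ladder score": [7.5, 6.0, 4.5],
})

# Country value = mean of all participants' answers in that country.
country_means = responses.groupby("country_code", as_index=False)["F3"].mean()

# Join on the country code and compute the Pearson correlation.
merged = country_means.merge(ladder, on="country_code", how="inner")
r = merged["Ladder score"].corr(merged["F3"])

print(merged)
print(f"Pearson r = {r:.3f}")
```

In this toy example, the countries with higher mean agreement have lower ladder scores, so the correlation is strongly negative.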

In [ ]:
correlations_df
Out[ ]:
     correlation  direction      value  abs_value  question_code  code  question
 55         high   indirect  -0.623084   0.623084             F3    F3  I believe in one true religion
148         high     direct   0.520874   0.520874             O6    O6  I am not bothered by messy people
 53         high   indirect  -0.510531   0.510531             F1    F1  I believe laws should be strictly enforced
 52         high   indirect  -0.505590   0.505590            E10   E10  I dislike loud music
 70       medium   indirect  -0.499895   0.499895             G8    G8  I have little to say
 ..          ...        ...        ...        ...            ...   ...  ...
 44          low     direct   0.004793   0.004793             E2    E2  I love large parties
 86          low     direct   0.004746   0.004746             I4    I4  I distrust people
 99          low     direct   0.004740   0.004740             J7    J7  I do unexpected things
139          low   indirect  -0.002114   0.002114             N7    N7  I enjoy my privacy
 63          low   indirect  -0.001295   0.001295             G1    G1  I feel comfortable around people

162 rows × 7 columns

The top 10 highest correlated questions:

In [ ]:
correlations_df.head(10)
Out[ ]:
     correlation  direction      value  abs_value  question_code  code  question
 55         high   indirect  -0.623084   0.623084             F3    F3  I believe in one true religion
148         high     direct   0.520874   0.520874             O6    O6  I am not bothered by messy people
 53         high   indirect  -0.510531   0.510531             F1    F1  I believe laws should be strictly enforced
 52         high   indirect  -0.505590   0.505590            E10   E10  I dislike loud music
 70       medium   indirect  -0.499895   0.499895             G8    G8  I have little to say
132       medium   indirect  -0.494390   0.494390            M10   M10  I try to avoid complex people
154       medium   indirect  -0.491003   0.491003             P2    P2  I get angry easily
 57       medium   indirect  -0.478769   0.478769             F5    F5  I like to stand during the national anthem
 60       medium     direct   0.466960   0.466960             F8    F8  I use swear words
 81       medium   indirect  -0.464070   0.464070             H9    H9  I dislike works of fiction

The bottom 10 lowest correlated questions:

In [ ]:
correlations_df.tail(10)
Out[ ]:
     correlation  direction      value  abs_value  question_code  code  question
 83          low   indirect  -0.008536   0.008536             I1    I1  I find it hard to forgive others
 90          low     direct   0.008515   0.008515             I8    I8  I trust others
 48          low   indirect  -0.007176   0.007176             E6    E6  I act wild and crazy
121          low   indirect  -0.006998   0.006998             L9    L9  I am not easily bothered by things
 50          low   indirect  -0.005396   0.005396             E8    E8  I don't like crowded events
 44          low     direct   0.004793   0.004793             E2    E2  I love large parties
 86          low     direct   0.004746   0.004746             I4    I4  I distrust people
 99          low     direct   0.004740   0.004740             J7    J7  I do unexpected things
139          low   indirect  -0.002114   0.002114             N7    N7  I enjoy my privacy
 63          low   indirect  -0.001295   0.001295             G1    G1  I feel comfortable around people

Each row shows the correlation group (low for absolute values below 0.3, medium from 0.3 up to 0.5, high above 0.5), the direction of the correlation (direct if the correlation value is positive, otherwise indirect), the correlation value itself, its absolute value, the question code and the question text.
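The grouping rule can be written as a small helper. This is a sketch, not the notebook's own code: the function name `classify` is hypothetical, and treating a value of exactly 0.5 as high is an assumption about the boundary case. The three example values are taken from the tables above.

```python
def classify(r):
    """Return (group, direction) for a correlation value r."""
    strength = abs(r)
    if strength >= 0.5:      # assumption: exactly 0.5 counts as high
        group = "high"
    elif strength >= 0.3:
        group = "medium"
    else:
        group = "low"
    # Labels follow the output tables: positive is direct, otherwise indirect.
    direction = "direct" if r > 0 else "indirect"
    return group, direction


# Example values copied from the correlation tables above.
for code, r in [("F3", -0.623084), ("G8", -0.499895), ("E2", 0.004793)]:
    print(code, classify(r))
```

G8, at -0.499895, falls just short of the high group, which shows how sensitive the group counts are to the chosen thresholds.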

In [ ]:
# Count the number of high, medium and low correlations.
correlations_df['correlation'].value_counts()
Out[ ]:
low       127
medium     31
high        4
Name: correlation, dtype: int64

While the majority of correlations are low, there are also 31 medium correlations and 4 high correlations. The following cell shows the four questions that yielded a high correlation, sorted by absolute value:

In [ ]:
high_correlations_df = correlations_df[correlations_df['correlation'] == 'high']
high_correlations_df['question']
Out[ ]:
55                 I believe in one true religion
148             I am not bothered by messy people
53     I believe laws should be strictly enforced
52                           I dislike loud music
Name: question, dtype: object

Let's look at scatter plots of the question with the highest absolute correlation with the ladder score ('I believe in one true religion') and the question with the lowest ('I feel comfortable around people'). We included an ordinary least squares (OLS) trendline to show how well a linear trend fits the data. By hovering over a data point, you can see the respective country.

In [ ]:
import plotly.express as px
import plotly.io as pio
pio.renderers.default = 'notebook'

# Get the question for the code "F3" for the xaxis label.
question = correlations_df.loc[correlations_df['question_code'] == 'F3', 'question'].iloc[0]

# Also include country name and country code when hovering over the data points.
fig = px.scatter(merged_df, x="F3", y="Ladder score", trendline="ols", hover_data=['country_code','Country name'])
fig.update_layout(xaxis_title=question + ' (1: Strong no, 5: Strong yes)')
fig.show()

We observe the high negative (indirect) correlation between the degree of agreement with the statement and the ladder score. The R^2 value (coefficient of determination) of the linear regression is approx. 0.39, which is the square of the correlation value of approx. -0.623. This means that about 39% of the variance in the ladder score can be explained by the responses to this question. In this case, the range of the countries' (aggregated) responses to this question is relatively wide, from about 1.5 to 4.5. The countries that agreed most strongly were

  • Pakistan
  • Honduras
  • Indonesia
  • Saudi Arabia
  • Egypt,

whereas the least agreeing countries were:

  • Sweden
  • Denmark
  • Belgium
  • Netherlands
  • Russia,

which is consistent with other observations about countries' religiosity.
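The statement that R^2 equals the squared correlation holds for any simple linear regression with an intercept. The following sketch checks this numerically on synthetic data; the generated values and the noise level are arbitrary assumptions and do not reproduce the project's dataset.

```python
import numpy as np

# Synthetic data: x mimics mean responses, y mimics ladder scores.
rng = np.random.default_rng(0)
x = rng.uniform(1.5, 4.5, size=50)
y = 8.0 - 0.8 * x + rng.normal(0.0, 0.8, size=50)

# Pearson correlation between x and y.
r = np.corrcoef(x, y)[0, 1]

# Ordinary least squares fit of a straight line.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# R^2 = 1 - (residual variance / total variance).
r_squared = 1.0 - residuals.var() / y.var()

print(f"r = {r:.3f}, r^2 = {r**2:.3f}, R^2 = {r_squared:.3f}")

# Applied to the value reported above for question F3:
print(f"(-0.623084)^2 = {(-0.623084) ** 2:.3f}")
```

The two printed values r^2 and R^2 agree up to floating-point error, which is why the trendline's R^2 of about 0.39 carries the same information as the correlation of about -0.623.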

In [ ]:
# Get the question for the code "G1" for the xaxis label.
question = correlations_df.loc[correlations_df['question_code'] == 'G1', 'question'].iloc[0]

# Also include country name and country code when hovering over the data points.
fig = px.scatter(merged_df, x="G1", y="Ladder score", trendline="ols", hover_data=['country_code','Country name'])
fig.update_layout(xaxis_title=question + ' (1: Strong no, 5: Strong yes)')
fig.show()

We observe the very low correlation between the degree of agreement with the statement and the ladder score. The R^2 value in this case is far below 0.001, which shows that the responses to this question explain essentially none of the variance in the ladder score. We also note that the range of the countries' (aggregated) responses to this question is relatively narrow, from about 2.8 to 3.8.

Discussion and Conclusions

Disclaimer: The interpretations made in this section are not to be taken too seriously and/or scientifically.

In this section, we discuss the results from above. Again, for each country, we first calculated the mean of all responses to that question in order to obtain a measure of a country's degree of agreeing with the questions.

  • Independent variable: For a given country, the mean response to a question as described above.
  • Dependent variable: For a given country, the ladder-score.

For each question of the questionnaire, we then calculated the correlation and performed a simple linear regression. Results showed 4 questions with high correlations:

  • I believe in one true religion (-)
  • I am not bothered by messy people (+)
  • I believe laws should be strictly enforced (-)
  • I dislike loud music (-),

where (+) indicates a positive correlation, and (-) indicates a negative correlation.


Before we investigate the big picture, let us look at why the first question might display a high correlation.

The question with highest correlation was

"I believe in one true religion"

It showed a high negative correlation, meaning that countries whose average response leaned towards disagreement tended to have higher ladder scores. There could be several psychological and sociological explanations for this. Some possible directions are

  • Believing in one true religion might lead to a sense of exclusivity, causing judgment or isolation from those who don't share the same beliefs, which can impact social connections and support.
  • A rigid belief in one true religion may limit openness to other perspectives or new ideas, potentially hindering personal growth and adaptability.
  • There could also be an indirect relationship: countries with weaker economies may be more inclined to follow one exclusive religion, so that the religiosity of a country is merely a proxy for its economic state, which is what actually determines life satisfaction.
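The last explanation, a third variable driving both quantities, could be probed with a partial correlation. The sketch below uses purely synthetic data in which a hypothetical economic indicator `gdp` drives both the agreement score and the ladder score; the variable names, coefficients and the helper functions `residualize` and `partial_corr` are illustrative assumptions, not part of the project.

```python
import numpy as np


def residualize(values, control):
    """Residuals of values after a least-squares line fit on control."""
    slope, intercept = np.polyfit(control, values, deg=1)
    return values - (slope * control + intercept)


def partial_corr(x, y, control):
    """Correlation of x and y after removing the linear effect of control."""
    return np.corrcoef(residualize(x, control), residualize(y, control))[0, 1]


# Synthetic example: gdp influences both variables, which do not
# influence each other directly.
rng = np.random.default_rng(1)
n = 500
gdp = rng.normal(0.0, 1.0, n)
religion = -0.8 * gdp + rng.normal(0.0, 0.6, n)
ladder = 0.8 * gdp + rng.normal(0.0, 0.6, n)

raw = np.corrcoef(religion, ladder)[0, 1]
partial = partial_corr(religion, ladder, gdp)

print(f"raw correlation:     {raw:.3f}")
print(f"partial correlation: {partial:.3f}")
```

In this constructed case the raw correlation is clearly negative, while the partial correlation is close to zero. Whether the real data behave this way would have to be checked with an actual economic indicator per country.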

Looking at the common, underlying trait among the correlations above, we notice that higher ladder scores correlate with responses that go in the direction of open-mindedness and tolerance. They reflect an attitude of acceptance toward diversity in beliefs (not having one true religion), behaviors (being unbothered by messy people), governance (not strictly enforcing laws), and preferences (liking loud music). These attitudes often align with a broader outlook that accommodates various perspectives and differences without judgment or strict adherence to conventional norms.


When looking at the next six highest-correlated questions,

  • I have little to say (-)
  • I try to avoid complex people (-)
  • I get angry easily (-)
  • I like to stand during the national anthem (-)
  • I use swear words (+)
  • I dislike works of fiction (-)

we can still, overall, observe that a country's disposition towards openness and non-conformity to societal norms is correlated with higher ladder scores. This includes being comfortable with complexity in people, having a higher threshold for anger, not adhering strictly to patriotic or nationalistic rituals (like standing during the national anthem), being more liberal with language (using swear words), and having an appreciation for the world of fiction. This set of traits suggests an inclination towards individuality, a willingness to challenge norms, and a more relaxed approach to conventions typically expected in social settings. The first question, "I have little to say", aligns with the broader theme of openness of communication and expression.


However, when looking at the bottom 10 questions with regard to the correlation, this interpretation reaches its limits:

  • I find it hard to forgive others.
  • I trust others.
  • I act wild and crazy.
  • I am not easily bothered by things.
  • I don't like crowded events.
  • I love large parties.
  • I distrust people.
  • I do unexpected things.
  • I enjoy my privacy.
  • I feel comfortable around people.

For all of these questions, the absolute correlation was below 0.01. If characteristics such as difficulty forgiving others, trust in others and preferences for social settings are unrelated to life satisfaction at the country level, then the interpretation above, which connects certain traits to higher life satisfaction, is not fully supported by this set of traits. The relationship between these traits and life satisfaction may be more nuanced or may depend on factors not captured in this analysis. Understanding how personality traits interact and affect wellbeing requires a more comprehensive examination that considers additional variables and contexts.


Conclusions:

  • Correlation is not the same as causation.

Outlook:

  • The analysis could be repeated at the individual level rather than at the country-aggregated level.